Managing and analyzing phylogenetic databases

نویسندگان

  • AKSHAY DEEPAK
  • David Fernández-Baca
  • Oliver Eulenstein
  • Xiaoqiu Huang
  • Pavan Aduri
  • Srikanta Tirthapura
چکیده

The ever growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases. Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range from four to 20000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval. It also introduces new ideas that are specific to tree databases. STBase provides a unique opportunity to explore innovative ways to analyze the results from queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees. Specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and, propose two new approaches based on agreement subtrees and frequent subtrees respectively. The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling “rogue” taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single labeled tree. We show that the inherent problem in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparative Evaluation of Microarray-based Gene Expression Databases

Microarrays make it possible to monitor the expression of thousands of genes in parallel thus generating huge amounts of data. So far, several databases have been developed for managing and analyzing this kind of data but the current state of the art in this field is still early stage. In this paper, we comprehensively analyze the requirements for microarray data management. We consider the var...

متن کامل

BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computationand data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this w...

متن کامل

REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions o...

متن کامل

Efficiently Supporting Structure Queries on Phylogenetic Trees

With phylogenetics becoming increasingly important in biomedical research, the number of phylogenetic studies is increasing rapidly and huge mount of phylogenetic data has been generated and stored in databases. How to efficiently extract information from the data has become an important research problem. In this paper, we focus on a class of important queries on phylogenetic trees: structure q...

متن کامل

Amelioration of soils contaminated with radionuclides: Exploiting biodiversity to minimise or maximise soil to plant transfer

There is much current interest in managing and manipulating radionuclide transfer from soils to plants, either for ‘safe’ crops or for phytoextraction of radionuclide contaminated soils. Potentially there is biodiversity in plant uptake that might aid these efforts. We have established the effects of phylogeny (evolutionary history) on the uptake of numerous heavy metal and nutrient elements. H...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015